-
With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the earliest ones created, and they have many known problems, including extreme bias, spurious correlations, and an inability to permit fine-grained analysis. In this paper, we pioneer evaluating recent MLLMs (LLaVA 1.5, LLaVA-NeXT, BLIP2, InstructBLIP, GPT-4V, and GPT-4o) on datasets designed to address weaknesses in earlier ones. We assess three VQA datasets: 1) TDIUC, which permits fine-grained analysis on 12 question types; 2) TallyQA, which has simple and complex counting questions; and 3) DVQA, which requires optical character recognition for chart understanding. We also study VQDv1, a dataset that requires identifying all image regions that satisfy a given query. Our experiments reveal the weaknesses of many MLLMs that have not previously been reported. Our code is integrated into the widely used LAVIS framework for MLLM evaluation, enabling the rapid assessment of future MLLMs.
Free, publicly-accessible full text available June 11, 2026
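To give a sense of the evaluation loop this enables, here is a minimal sketch using the stock LAVIS interface; the paper's extended loaders for TDIUC, TallyQA, and DVQA are not shown, and load_tdiuc below is a hypothetical stand-in for them:

import torch
from collections import defaultdict
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_vqa", model_type="vqav2", is_eval=True, device=device
)

def answer(raw_image: Image.Image, question: str) -> str:
    # Run one VQA query through the model.
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    text = txt_processors["eval"](question)
    return model.predict_answers(
        samples={"image": image, "text_input": text},
        inference_method="generate",
    )[0]

# Fine-grained accuracy per question type, in the style of TDIUC.
correct, total = defaultdict(int), defaultdict(int)
for ex in load_tdiuc():  # hypothetical loader yielding image/question/answer/question_type
    pred = answer(ex.image, ex.question)
    total[ex.question_type] += 1
    correct[ex.question_type] += int(pred.strip().lower() == ex.answer.lower())
per_type_accuracy = {t: correct[t] / total[t] for t in total}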
-
Free, publicly-accessible full text available April 1, 2026
-
Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit natural random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable of producing spatially-varying noise blends despite not having access to such data for training. These features are enabled by training a denoising diffusion model using a novel combination of data augmentation and network conditioning techniques. Like procedural noise generators, the model's behavior is controllable via interpretable parameters plus a source of randomness. We use our model to produce a variety of visually compelling noise textures. We also present an application of our model to improving inverse procedural material design; using our model in place of fixed-type noise nodes in a procedural material graph results in higher-fidelity material reconstructions without needing to know the type of noise in advance. Open-sourced materials can be found at https://armanmaesumi.github.io/onenoise/
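A minimal sketch of how such conditional sampling might look, assuming a standard DDPM formulation; denoiser, type_a_embedding, and type_b_embedding are hypothetical stand-ins, not the paper's actual interfaces:

import torch

def sample(denoiser, cond, steps=1000, shape=(1, 1, 256, 256), device="cpu"):
    # Standard DDPM ancestral sampling; `cond` is a per-pixel map of
    # noise-type embeddings and interpretable parameters fed to the denoiser.
    betas = torch.linspace(1e-4, 0.02, steps, device=device)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        t_batch = torch.full((shape[0],), t, device=device)
        eps = denoiser(x, t_batch, cond)   # predicted noise (hypothetical network)
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x

# Spatially-varying blend: interpolate the conditioning map across the image,
# e.g. one noise type on the left fading into another on the right.
w = torch.linspace(0, 1, 256).view(1, 1, 1, 256)
cond = (1 - w) * type_a_embedding + w * type_b_embedding  # hypothetical embeddings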
-
Despite the ubiquitous use of material maps in modern rendering pipelines, their editing and control remains a challenge. In this paper, we present an example-based material control method to augment input material maps based on user-provided material photos. We train a tileable version of MaterialGAN and leverage its material prior to guide the appearance transfer, optimizing its latent space using differentiable rendering. Our method transfers the micro- and meso-structure textures of the user-provided target photograph(s) while preserving the structure and quality of the input material. We show our method can control existing material maps, increasing realism or generating new, visually appealing materials.
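A minimal sketch of this kind of latent-space optimization through a differentiable renderer; generator, render, style_loss, structure_loss, and the loss weighting are hypothetical stand-ins, not the paper's actual pipeline:

import torch

z = initial_latent.clone().requires_grad_(True)  # latent of the input material (hypothetical)
opt = torch.optim.Adam([z], lr=0.01)

for step in range(500):
    maps = generator(z)                   # albedo/normal/roughness maps (hypothetical GAN)
    image = render(maps, light, view)     # differentiable rendering step (hypothetical)
    loss = style_loss(image, target_photo)                # match target micro/meso texture
    loss = loss + 0.1 * structure_loss(maps, input_maps)  # preserve input structure (weight is a guess)
    opt.zero_grad()
    loss.backward()                       # gradients flow through the renderer and GAN
    opt.step()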
-
Procedural models (i.e., symbolic programs that output visual data) are a historically popular method for representing graphics content: vegetation, buildings, textures, etc. They offer many advantages: interpretable design parameters, stochastic variations, high-quality outputs, compact representation, and more. But they also have some limitations, such as the difficulty of authoring a procedural model from scratch. More recently, AI-based methods, and especially neural networks, have become popular for creating graphic content. These techniques allow users to directly specify desired properties of the artifact they want to create (via examples, constraints, or objectives), while a search, optimization, or learning algorithm takes care of the details. However, this ease of use comes at a cost, as it is often hard to interpret or manipulate these representations. In this state-of-the-art report, we summarize research on neurosymbolic models in computer graphics: methods that combine the strengths of both AI and symbolic programs to represent, generate, and manipulate visual data. We survey recent work applying these techniques to represent 2D shapes, 3D shapes, and materials and textures. Along the way, we situate each prior work in a unified design space for neurosymbolic models, which helps reveal underexplored areas and opportunities for future research.
-
Conventional rendering techniques are primarily designed and optimized for single-frame rendering. In practical applications, such as scene editing and animation rendering, users frequently encounter scenes where only a small portion is modified between consecutive frames. In this paper, we develop a novel approach to incremental re-rendering of scenes with dynamic objects, where only a small part of a scene moves from one frame to the next. We formulate the difference (or residual) in the image between two frames as a (correlated) light-transport integral, which we call the residual path integral. Efficient numerical solution of this integral then involves (1) devising importance sampling strategies to focus on paths with non-zero residual-transport contributions and (2) choosing appropriate mappings between the native path spaces of the two frames. We introduce a set of path importance sampling strategies that trace from the moving object(s), which are the sources of residual energy. We explore path mapping strategies that generalize those from gradient-domain path tracing to our importance sampling techniques, specialized for dynamic scenes. Additionally, our formulation can be applied to material editing as a simpler special case. We demonstrate speed-ups over previous correlated sampling of path differences and over rendering the new frame independently. Our formulation brings new insights into the re-rendering problem and paves the way for devising new types of sampling techniques and path mappings with different trade-offs.
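One plausible way to write the residual path integral consistent with this description (notation ours, not necessarily the paper's): the difference between frames t and t+1 collapses into a single correlated integral once a mapping T between the two path spaces is chosen:

\Delta I \;=\; \int_{\Omega} f_{t+1}(\bar{x})\, d\mu(\bar{x})
        \;-\; \int_{\Omega} f_{t}(\bar{x})\, d\mu(\bar{x})
       \;=\; \int_{\Omega} \Big[ f_{t+1}\big(T(\bar{x})\big)\, \big|T'(\bar{x})\big|
        \;-\; f_{t}(\bar{x}) \Big]\, d\mu(\bar{x})

Here f_t and f_{t+1} are the path contribution functions of the two frames and |T'| is the Jacobian of the path mapping; importance sampling then targets paths where the bracketed residual is non-zero, i.e. paths touching the moving objects.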
-
Precomputed Radiance Transfer (PRT) remains an attractive solution for real-time rendering of complex light transport effects such as glossy global illumination. After precomputation, we can relight the scene with new environment maps while changing viewpoint in real time. However, practical PRT methods are usually limited to low-frequency spherical harmonic lighting. All-frequency techniques using wavelets are promising but have so far had little practical impact. The curse of dimensionality and much higher data requirements have typically limited them to relighting with fixed view or only direct lighting with triple product integrals. In this paper, we demonstrate a hybrid neural-wavelet PRT solution to high-frequency indirect illumination, including glossy reflection, for relighting with changing view. Specifically, we seek to represent the light transport function in the Haar wavelet basis. For global illumination, we learn the wavelet transport using a small multi-layer perceptron (MLP) applied to a feature field as a function of spatial location and wavelet index, with reflected direction and material parameters being other MLP inputs. We optimize/learn the feature field (compactly represented by a tensor decomposition) and MLP parameters from multiple images of the scene under different lighting and viewing conditions. We demonstrate real-time (512 × 512 at 24 FPS, 800 × 600 at 13 FPS) precomputed rendering of challenging scenes involving view-dependent reflections and even caustics.
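A minimal sketch of the relighting step under this representation; feature_field, mlp, and the wavelet-coefficient handling are hypothetical stand-ins, with the tensor decomposition and training loop omitted:

import torch

def radiance(x, refl_dir, material, env_coeffs, wavelet_embed):
    # env_coeffs: (K,) Haar wavelet coefficients of the environment map;
    # wavelet_embed: (K, E) embeddings of the corresponding wavelet indices.
    feats = feature_field(x)                       # spatial features, (F,) (hypothetical)
    k = env_coeffs.shape[0]
    inputs = torch.cat([
        feats.expand(k, -1),                       # (K, F)
        wavelet_embed,                             # (K, E)
        refl_dir.expand(k, -1),                    # (K, 3)
        material.expand(k, -1),                    # (K, M)
    ], dim=-1)
    transport = mlp(inputs).squeeze(-1)            # (K,) learned transport coefficients
    # Relit radiance = inner product of transport and lighting in the wavelet basis.
    return (transport * env_coeffs).sum()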
-
Importance: Abortion bans may lead to births among those who are unable to overcome barriers to abortion. The population-level effects of these policies, particularly their unequal impacts across subpopulations in the US, remain unclear.
Objective: To assess heterogeneity in the association of abortion bans with changes in fertility in the US, within and across states.
Design, Setting, and Participants: Drawing from birth certificate and US Census Bureau data from 2012 through 2023 for all 50 states and the District of Columbia, this study used a Bayesian panel data model to evaluate state-by-subgroup-specific changes in fertility associated with complete or 6-week abortion bans in 14 US states. The average percent and absolute change in the fertility rate among females aged 15 through 44 years was estimated overall and by state, and within and across states by age, race and ethnicity, marital status, education, and insurance payer.
Exposure: Complete or 6-week abortion ban.
Main Outcomes and Measures: Fertility rate (births per 1000 reproductive-aged females) overall and by subgroups.
Results: There were an estimated 1.01 (95% credible interval [CrI], 0.45-1.64) additional births above expectation per 1000 females aged 15 through 44 years (reproductive age) in states following adoption of abortion bans (60.55 observed vs 59.54 expected; 1.70% increase; 95% CrI, 0.75%-2.78%), equivalent to 22,180 excess births, with evidence of variation by state and subgroup. Estimated differences above expectation were largest for racially minoritized individuals (≈2.0%), unmarried individuals (1.79%), individuals younger than 35 years (≈2.0%), Medicaid beneficiaries (2.41%), and those without college degrees (high school diploma, 2.36%; some college, 1.58%), particularly in southern states. Differences in race and ethnicity and education across states explain most of the variability in the state-level association between abortion bans and fertility rates.
Conclusions and Relevance: These findings provide evidence that fertility rates in states with abortion bans were higher than would have been expected in the absence of these policies, with the largest estimated differences among subpopulations experiencing the greatest structural disadvantages and in states with among the worst maternal and child health and well-being outcomes.
Free, publicly-accessible full text available April 15, 2026
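As a quick consistency check on the headline estimates (the implied person-years figure is a back-of-envelope inference from the reported numbers, not a figure from the paper):

observed, expected = 60.55, 59.54     # births per 1,000 females aged 15-44
diff = observed - expected            # 1.01 additional births per 1,000
pct = 100 * diff / expected           # ~1.70% increase, matching the abstract
person_years = 22_180 / diff * 1_000  # ~22 million person-years implied (inference)
print(f"{diff:.2f} per 1,000; {pct:.2f}% increase; {person_years:,.0f} person-years")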
-
Procedural modeling allows for the automatic generation of large amounts of similar assets, but there is limited control over the generated output. We address this problem by introducing Automatic Differentiable Procedural Modeling (ADPM). The forward procedural model generates a final editable model. The user modifies the output interactively, and the modifications are transferred back to the procedural model as its parameters by solving an inverse procedural modeling problem. We present an auto-differentiable representation of the procedural model that significantly accelerates optimization. In ADPM the procedural model is always available, all changes are non-destructive, and the user can interactively model the 3D object while keeping the procedural representation. ADPM provides the user with precise control over the resulting model, comparable to non-procedural interactive modeling. ADPM is node-based, and it generates hierarchical 3D scene geometry converted to a differentiable computational graph. Our formulation focuses on the differentiability of high-level primitives and bounding volumes of components of the procedural model rather than the detailed mesh geometry. Although this high-level formulation limits the expressiveness of user edits, it allows for efficient derivative computation and enables interactivity. We designed a new optimizer to solve the inverse procedural modeling problem. It can detect that an edit is under-determined and has degrees of freedom. Leveraging cheap derivative evaluation, it can explore the region of optimality of edits and suggest various configurations, all of which achieve the requested edit differently. We show our system's efficiency on several examples, and we validate it with a user study.
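A minimal sketch of the inverse step, assuming a hypothetical differentiable forward model procedural_model that maps parameters to bounding volumes of components:

import torch

params = torch.tensor([2.0, 0.5, 1.0], requires_grad=True)  # illustrative parameters
opt = torch.optim.Adam([params], lr=0.05)

for step in range(200):
    boxes = procedural_model(params)                 # differentiable bounding volumes (hypothetical)
    loss = ((boxes - user_edited_boxes) ** 2).sum()  # match the user's requested edit
    opt.zero_grad()
    loss.backward()                                  # cheap: derivatives of high-level primitives only
    opt.step()

# An under-determined edit shows up as directions in which the loss is flat
# (a null space of the edit's Jacobian); the paper's optimizer explores these
# to suggest alternative configurations that achieve the same edit.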